[57] ABSTRACT 

A method, apparatus, and article of manufacture for a 
computer implemented recover/build index system. The 
recover/build index system builds a database index for a 
database file by scanning partitions of the database file in 
parallel to retrieve key values and their associated record 
identifier (rid) values. The recover/build index system then 
sorts the scanned key/rid values for each partition in parallel. 
Next, the recover/build index system performs one or more 
merges on the sorted key/rid values from all of the partitions 
to generate a single key/rid value stream. Finally, the 
recover/build index system builds the index using the single 
key/rid value stream. 

33 Claims, 4 Drawing Sheets 
HIGH PERFORMANCE RECOVER/BUILD INDEX SYSTEM 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates in general to computer-implemented 
recover/build index systems, and in particular to high 
performance recover/build index systems by unloading 
database files in parallel. 

2. Description of Related Art 

An index is an ordered set of references to the records or 
rows in a database file or table. The index is used to access 
each record in the file using a key (i.e., one of the fields of 
the record or attributes of the row). However, building an 
index for a large file can take a considerable amount of 
elapsed time. The process involves scanning all records in 
the file, extracting a key value and record identifier (rid) 
value from each of the records, sorting all of the key/rid 
values, and then building the index from the sorted key/rid 
values. Typically, the scanning, sorting, and index build 
steps are performed serially, which can be time consuming 
in the case of a large database file. 

Additionally, when computer systems fail, the index could 
be corrupted or destroyed. In this case, recovery of the 
index, which involves rebuilding the index, can be very time 
consuming. Therefore, there is a need in the art for 
techniques that build indices more efficiently. 

SUMMARY OF THE INVENTION 

To overcome the limitations in the prior art described 
above, and to overcome other limitations that will become 
apparent upon reading and understanding the present 
specification, the present invention discloses a method, 
apparatus, and article of manufacture for a computer 
implemented recover/build index system. In accordance with the 
present invention, the recover/build index system builds an 
index for a file by scanning partitions of the file in parallel 
to retrieve key/rid values. The recover/build index system 
then sorts the scanned key/rid values for each partition in 
parallel. Next, the recover/build index system performs one 
or more merges on the sorted key/rid values from all of the 
partitions to generate a single key/rid value stream. Finally, 
the recover/build index system builds the index using the 
single key/rid value stream. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Referring now to the drawings in which like reference 
numbers represent corresponding parts throughout: 

FIG. 1 is a block diagram illustrating an exemplary 
hardware environment used to implement the preferred 
embodiment of the invention; 

FIG. 2 is a flowchart illustrating the general logic of the 
recover/build index system according to the present 
invention; 

FIG. 3 is a dataflow diagram that illustrates the operation 
of a first embodiment of the recover/build index system 
according to the present invention; and 

FIG. 4 is a dataflow diagram that illustrates the operation 
of a second embodiment of the recover/build index system 
according to the present invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

In the following description of the preferred embodiment, 
reference is made to the accompanying drawings which 
form a part hereof, and which is shown by way of illustration 
a specific embodiment in which the invention may be 
practiced. It is to be understood that other embodiments may 
be utilized and structural changes may be made without 
departing from the scope of the present invention. 

Hardware Environment 

FIG. 1 is a block diagram illustrating an exemplary 
hardware environment used to implement the preferred 
embodiment of the invention. In the exemplary 
environment, a computer system 100 is comprised of one or 
more processors 102 coupled via an interconnect 104. One 
or more peripheral devices 106, including fixed and/or 
removable data storage devices such as a hard disk, floppy 
disk, CD-ROM, tape, etc., may be coupled to each of the 
processors 102. 

The present invention is typically implemented using a 
number of computer programs executed in parallel by the 
processors 102, including scan programs 108, sort programs 
110, merge programs 112, and index build programs 114. 
Using these computer programs, the present invention builds 
an index 116 for a database file 118 having one or more 
partitions 120, all of which are stored in one or more of the 
data storage devices 106. Preferably, each of the partitions 
120 is scanned in parallel by the scan programs 108 and the 
scanning results are then sorted in parallel by the sort 
programs 110, in order to enhance the performance of the 
system. 

The scan 108, sort 110, merge 112, and index build 114 
computer programs all execute under the control of an 
operating system, such as MVS, AIX, OS/2, WINDOWS 
NT, WINDOWS, UNIX, etc. Further, the scan 108, sort 110, 
merge 112, and index build 114 computer programs are all 
tangibly embodied in or readable from a computer-readable 
medium, e.g. one or more of the data storage devices 106 
and/or data communications devices coupled to the 
computer system 100. Moreover, the scan 108, sort 110, merge 
112, and index build 114 computer programs are all 
comprised of instructions which, when read and executed by the 
processors 102, cause the processors 102 to perform the 
steps necessary to implement and/or use the present 
invention. 

Those skilled in the art will recognize that any 
combination of the above components, or any number of different 
components, peripherals, and other devices, may be used to 
implement the present invention. For example, the present 
invention may be implemented on lesser or greater numbers of 
processors 102 without departing from the scope of the 
present invention. Further, the number and configuration of 
the scan 108, sort 110, merge 112, and index build 114 
computer programs may be altered without departing from 
the scope of the present invention. Finally, the structure of 
the index 116, file 118, and/or partitions 120 may be altered 
without departing from the scope of the present invention. 

FIG. 2 is a flowchart illustrating the general logic of the 
recover/build index system according to the present 
invention. The recover/build index system of the present invention 
builds an index 116 by scanning or unloading partitions 120 
of the file 118 in parallel. 

In step 200, concurrent scan programs 108 are used to 
scan every record of each partition 120. The scan programs 
108 executing in parallel extract key values (of a particular 
key) and record identifiers (rids) or pointers from the 
partitions 120 to create a key/rid or scan stream for each 
partition 120. While the scan programs 108 are scanning the 
partitions 120, the file 118 can still be read by other 
programs. The parallel scan programs 108 pass the scan 
streams to the sort programs 110. 

In step 202, the sort programs 110 executing in parallel 
receive the scan streams for each partition 120 and create a 
sort stream therefrom, and then pass the sort stream to the 
merge program 112. Each sort program 110 can accept a 
scan stream from one or more scan programs 108. 
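
Steps 200 and 202 can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory partitions, the field name "name", and the helper names scan_partition, sort_stream, and scan_and_sort are all assumptions made for illustration, and a thread pool stands in for the parallel scan and sort programs (a process pool or separate machines could be substituted).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a partitioned file: each partition is a list of
# (rid, record) pairs, and each record is a dict of fields.
partitions = [
    [(0, {"name": "mia"}), (1, {"name": "bob"})],
    [(2, {"name": "zoe"}), (3, {"name": "amy"})],
]

def scan_partition(partition, key_field):
    """Step 200: extract a (key, rid) pair from every record of one partition."""
    return [(record[key_field], rid) for rid, record in partition]

def sort_stream(scan_stream):
    """Step 202: order one scan stream by key, then by rid."""
    return sorted(scan_stream)

def scan_and_sort(partition, key_field="name"):
    """Run the scan and the sort for a single partition."""
    return sort_stream(scan_partition(partition, key_field))

# One worker per partition; each worker produces one sort stream.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    sort_streams = list(pool.map(scan_and_sort, partitions))
```

Each element of sort_streams is an independently sorted list of key/rid pairs, one per partition, ready to be handed to a merge.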

In step 204, the merge program 112 merges the sort 
streams received from the sort programs 110 to create a 
merge stream. The merge program 112 accepts the sort 
streams from two or more sort programs 110. The merge 
program 112 then passes the merge stream to an index build 
program 114. 
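
The merge of step 204 can be sketched in Python with the standard library's heapq.merge, which lazily combines already-sorted inputs into one sorted output. The sample streams and the name merge_streams are hypothetical; the text does not specify which merge algorithm the merge program 112 uses.

```python
import heapq

# Hypothetical sort streams, one per sort program, each already ordered by key.
sort_streams = [
    [("bob", 1), ("mia", 0)],
    [("amy", 3), ("zoe", 2)],
    [("kim", 5), ("lee", 4)],
]

def merge_streams(streams):
    """Lazily merge already-sorted (key, rid) streams into one sorted stream."""
    return heapq.merge(*streams)

# Materialized here only so the result can be inspected; an index build step
# could consume the iterator directly.
merge_stream = list(merge_streams(sort_streams))
```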

In step 206, the index build program 114 builds the index 
116 from the merge stream received from the merge program 
112. The index 116 is built in a compact and compressed 
manner, because the key values are sorted. As a result, 
complete blocks or pages of sequential key/rid values can be 
written to the index 116 and the index build program 114 
improves the efficiency of I/O and improves the final orga- 
nization of the index 116. 
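
The benefit of building from sorted input can be shown with a small Python sketch that packs a merge stream into full pages in a single pass. The page capacity, the function name build_leaf_pages, and the list-of-lists page layout are assumptions for illustration; the sketch does not model compression or the actual on-disk format of the index 116.

```python
PAGE_CAPACITY = 3  # hypothetical number of key/rid entries per index page

def build_leaf_pages(merge_stream, capacity=PAGE_CAPACITY):
    """Pack an already-sorted (key, rid) stream into full pages, in order."""
    pages, current = [], []
    for entry in merge_stream:
        current.append(entry)
        if len(current) == capacity:
            pages.append(current)   # one complete page, written once
            current = []
    if current:
        pages.append(current)       # final, possibly partial, page
    return pages

merge_stream = [("amy", 3), ("bob", 1), ("kim", 5), ("lee", 4),
                ("mia", 0), ("zoe", 2), ("zoe", 6)]
pages = build_leaf_pages(merge_stream)

# First key of each page, usable as a separator in a higher index level.
separators = [page[0][0] for page in pages]
```

Because the input is already in key order, every page except possibly the last is filled completely and no page is revisited after it is written.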

Moreover, by performing many of the steps for building 
the index 116 in parallel, the high performance recover/build 
index system reduces the amount of time that it takes to 
build an index 116. Additionally, the recover/build index 
system can exploit the use of multiple processors 102. 
Additionally, piping the data between the sort 110 and merge 
112 programs improves performance. Intermediate commu- 
nication or intermediate files are used to transfer data 
between the sort 110, merge 112, and index build 114 
programs. 

Additionally, after scanning the partitions 120 to extract 
key values and record identifiers, this information is written 
to a file on a data storage device 106. Then, in case of system 
failure, the index 116 can be quickly rebuilt using the steps 
discussed above. 
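
As a rough Python sketch of this recovery path, the scanned key/rid pairs can be saved to a file and read back later so that a rebuild can start from the saved pairs instead of rescanning the partitions. The JSON-lines layout, the file name, and the helper names save_scan_stream and load_scan_stream are assumptions; the text does not describe the file format.

```python
import json
import os
import tempfile

def save_scan_stream(scan_stream, path):
    """Write one JSON-encoded [key, rid] pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for key, rid in scan_stream:
            f.write(json.dumps([key, rid]) + "\n")

def load_scan_stream(path):
    """Read the saved pairs back as (key, rid) tuples."""
    with open(path, encoding="utf-8") as f:
        return [tuple(json.loads(line)) for line in f]

scan_stream = [("mia", 0), ("bob", 1), ("zoe", 2)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "partition0.keyrid")  # hypothetical file name
    save_scan_stream(scan_stream, path)
    # After a failure, the rebuild starts from the saved pairs.
    recovered = load_scan_stream(path)

# The recovered pairs then go through the same sort, merge, and build steps.
rebuilt_order = sorted(recovered)
```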

Dataflow Diagrams 

FIG. 3 is a dataflow diagram that illustrates the operation 
of a first embodiment of the recover/build index system 
according to the present invention. In this embodiment, the 
recover/build index system includes multiple scan programs 
108 that are performed in parallel by multiple processors 
102 against multiple partitions 120 of one file 118 to extract 
key/rid values. Similarly, sort programs 110 are performed 
in parallel to sort the extracted key/rid values. Thereafter, 
one or more merge programs 112 are performed to merge the 
key/rid values received from the sort programs 110. Finally, 
an index build program 114 is performed using a single 
stream of key/rid values from the merge programs 112 to 
build the index 116. 

FIG. 4 is a dataflow diagram that illustrates the operation 
of a second embodiment of the recover/build index system 
according to the present invention. In this embodiment of the 
recover/build index system, multiple merge programs 
112a-b are performed in parallel. Again, scan programs 108 
are performed in parallel to retrieve key/rid values. Then, 
multiple sort programs 110 are performed in parallel to sort 
the key/rid values by their key. Next, multiple intermediate 
merge programs 112a are performed on the key/rid values to 
generate merged data streams that are forwarded to a single 
final merge program 112b that merges all of the key/rid 
values (wherein the merge programs 112a-b are referred to 
as nested merges). Finally, an index build program 114 is 
performed to build the index 116 using the merged key/rid 
values. If more than one index 116 is to be built, the above 
steps may be performed on multiple streams of different 
key/rid values. 
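
The nested merges of FIG. 4 can be sketched in Python as two levels of heapq.merge: intermediate merges over groups of sort streams, feeding one final merge. The fan_in parameter and the name nested_merge are hypothetical, and the sketch runs the intermediate merges lazily in one thread rather than in parallel as the embodiment describes.

```python
import heapq

def nested_merge(sort_streams, fan_in=2):
    """Merge sorted streams in groups of fan_in, then merge the group results."""
    # Intermediate merges: one lazy merged stream per group of sort streams.
    intermediate = [
        heapq.merge(*sort_streams[i:i + fan_in])
        for i in range(0, len(sort_streams), fan_in)
    ]
    # Final merge: combine the intermediate streams into a single stream.
    return heapq.merge(*intermediate)

sort_streams = [
    [("bob", 1), ("mia", 0)],
    [("amy", 3), ("zoe", 2)],
    [("kim", 5), ("lee", 4)],
    [("ann", 7), ("tom", 6)],
]
merged = list(nested_merge(sort_streams))
```

With four sort streams and a fan-in of two, two intermediate merges feed the final merge, and the result is the same single sorted stream a flat merge would produce.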

Conclusion 

The foregoing description of the preferred embodiment of 
the invention has been presented for the purposes of illus- 
tration and description. It is not intended to be exhaustive or 
to limit the invention to the precise form disclosed. Many 
modifications and variations are possible in light of the 
above teaching. It is intended that the scope of the invention 
be limited not by this detailed description, but rather by the 
claims appended hereto. 
What is claimed is: 

1. A computer-implemented method for building an index 
for a database file, the index and file being stored in a data 
storage device coupled to a computer, the method compris- 
ing the steps of: 

performing, in the computer, multiple scans in parallel 
against the file, wherein each of the multiple scans 
extracts a desired key value and a record identifier for 
each of the scanned records to create a scan stream; 
performing, in the computer, multiple sorts in parallel 
against the scan streams to create multiple sort streams 
of extracted key values and record identifiers; 
performing, in the computer, one or more merges in 
parallel of the multiple sort streams to create a single 
merge stream of extracted key values and record iden- 
tifiers; and 

building, in the computer, the index for the file from the 
single merge stream of extracted key values and record 
identifiers. 

2. The method of claim 1, wherein each of the multiple 
sorts accepts one or more of the scan streams. 

3. The method of claim 1, wherein each of the merges 
accepts two or more of the sort streams. 

4. The method of claim 1, wherein the step of performing 
one or more merges further comprises performing nested 
merges so that there are one or more intermediate merges 
feeding a final merge. 

5. The method of claim 1 wherein the step of performing 
multiple scans extracts multiple key values for each record 
identifier for each of the scanned rows to form multiple scan 
streams. 

6. The method of claim 5, wherein the step of performing 
multiple sorts comprises the step of performing multiple 
sorts in parallel against the multiple scan streams to create 
multiple sort streams of extracted key values and record 
identifiers. 

7. The method of claim 6, wherein the step of performing 
one or more merges further comprises performing one or 
more merges of the multiple sort streams to create one or 
more merge streams. 

8. The method of claim 7, wherein the step of building the 
index for the file further comprises building multiple 
indexes, wherein each index is based on one of the merge 
streams. 

9. The method of claim 1, wherein the computer com- 
prises multiple processors. 

10. The method of claim 1, wherein the step of performing 
multiple scans further comprises writing the scanned key 
values to the data storage device for use in rebuilding the 
index. 

11. The method of claim 1, wherein the step of building 
the index further comprises the step of writing a compressed 
index. 
12. An apparatus for building a database index for a 
database table, comprising: 

a computer coupled to a data storage device for storing the 
database table; 

means, performed by the computer, for performing mul- 
tiple scans in parallel against the file, wherein each of 
the multiple scans extracts a desired key value and a 
record identifier for each of the scanned records to 
create a scan stream; 

means, performed by the computer, for performing mul- 
tiple sorts in parallel against the extracted key values 
and record identifiers to create multiple sort streams of 
extracted key values and record identifiers; 

means, performed by the computer, for performing one or 
more merges in parallel of the multiple sort streams to 
create a single merge stream of extracted key values 
and record identifiers; and 

means, performed by the computer, for building the index 
for the file from the single merge stream of extracted 
key values and record identifiers. 

13. An article of manufacture comprising a program 
storage device readable by a computer and tangibly embody- 
ing one or more programs of instructions executable by the 
computer to perform method steps for building an index for 
a database file, the computer having a data storage device 
coupled thereto for storing the index and the database file, 
the method comprising the steps of: 

performing, in the computer, multiple scans in parallel 
against the file, wherein each of the multiple scans 
extracts a desired key value and a record identifier for 
each of the scanned records to create a scan stream; 
performing, in the computer, multiple sorts in parallel 
against the scan streams to create multiple sort streams 
of extracted key values and record identifiers; 
performing, in the computer, one or more merges in 
parallel of the multiple sort streams to create a single 
merge stream of extracted key values and record iden- 
tifiers; and 

building, in the computer, the index for the file from the 
single merge stream of extracted key values and record 
identifiers. 

14. The apparatus of claim 12, wherein each of the 
multiple sorts accepts one or more of the scan streams. 

15. The apparatus of claim 12, wherein each of the merges 
accepts two or more of the sort streams. 

16. The apparatus of claim 12, wherein the means for 
performing one or more merges further comprises means for 
performing nested merges so that there are one or more 
intermediate merges feeding a final merge. 

17. The apparatus of claim 12, wherein the means for 
performing multiple scans further comprises means for 
extracting multiple key values for each record identifier for 
each of the scanned rows to form multiple scan streams. 

18. The apparatus of claim 17, wherein the means for 
performing multiple sorts comprises the means for 
performing multiple sorts in parallel against the multiple scan 
streams to create multiple sort streams of extracted key 
values and record identifiers. 

19. The apparatus of claim 18, wherein the means for 
performing one or more merges further comprises means for 
performing one or more merges of the multiple sort streams 
to create one or more merge streams. 

20. The apparatus of claim 19, wherein the means for 
building the index for the file further comprises means for 
building multiple indexes, wherein each index is based on 
one of the merge streams. 

21. The apparatus of claim 12, wherein the computer 
comprises multiple processors. 

22. The apparatus of claim 12, wherein the means for 
performing multiple scans further comprises means for 
writing the scanned key values to the data storage device for 
use in rebuilding the index. 

23. The apparatus of claim 12, wherein the means for 
building the index further comprises the means for writing 
a compressed index. 

24. The article of manufacture of claim 13, wherein each 
of the multiple sorts accepts one or more of the scan streams. 

25. The article of manufacture of claim 13, wherein each 
of the merges accepts two or more of the sort streams. 

26. The article of manufacture of claim 13, wherein the 
step of performing one or more merges further comprises 
performing nested merges so that there are one or more 
intermediate merges feeding a final merge. 

27. The article of manufacture of claim 13, wherein the 
step of performing multiple scans extracts multiple key 
values for each record identifier for each of the scanned rows 
to form multiple scan streams. 

28. The article of manufacture of claim 27, wherein the 
step of performing multiple sorts comprises the step of 
performing multiple sorts in parallel against the multiple 
scan streams to create multiple sort streams of extracted key 
values and record identifiers. 

29. The method of claim 28, wherein the step of 
performing one or more merges further comprises performing one or 
more merges of the multiple sort streams to create one or 
more merge streams. 

30. The article of manufacture of claim 29, wherein the 
step of building the index for the file further comprises 
building multiple indexes, wherein each index is based on 
one of the merge streams. 

31. The article of manufacture of claim 13, wherein the 
computer comprises multiple processors. 

32. The article of manufacture of claim 13, wherein the 
step of performing multiple scans further comprises writing 
the scanned key values to the data storage device for use in 
rebuilding the index. 

33. The article of manufacture of claim 13, wherein the 
step of building the index further comprises the step of 
writing a compressed index. 

* * * * * 